Lexical Acquisition for Clinical Text Mining Using Distributional Similarity

Authors

  • John A. Carroll
  • Rob Koeling
  • Shivani Puri

Abstract

We describe experiments into the use of distributional similarity for acquiring lexical information from clinical free text, in particular notes typed by primary care physicians (general practitioners). We also present a novel approach to lexical acquisition from ‘sensitive’ text, which does not require the text to be manually anonymised – a very expensive process – and therefore allows much larger datasets to be used than would normally be possible.
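The abstract does not say which context features or similarity measure the authors used. To illustrate the general idea behind distributional similarity (words that occur in similar contexts tend to have related meanings), here is a minimal sketch. It assumes simple window-based co-occurrence counts and cosine similarity. The function names and the toy notes are hypothetical and are not taken from the paper or its data.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within +/- `window` tokens."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    # Indexing a Counter with a missing key returns 0 without inserting it.
    dot = sum(count * v[feat] for feat, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbours(target, vectors, k=3):
    """Rank all other words by distributional similarity to `target`."""
    # Use .get so an unseen target is not inserted into the defaultdict.
    target_vec = vectors.get(target, Counter())
    scores = [(w, cosine(target_vec, vec)) for w, vec in vectors.items() if w != target]
    # Sort by descending similarity, breaking ties alphabetically.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


if __name__ == "__main__":
    # Hypothetical, invented note fragments (not real clinical data).
    notes = [
        "patient reports chest pain today",
        "patient reports chest ache today",
        "patient reports back pain today",
        "bp normal on review",
    ]
    vecs = context_vectors(notes, window=2)
    print(nearest_neighbours("pain", vecs, k=3))
```

In this toy corpus, "ache" shares most of its contexts with "pain", so it should rank as the closest neighbour. A real system would rely on much larger corpora, richer features (for example syntactic relations) and possibly a different weighting or similarity measure.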


Similar resources

The Distributional Inclusion Hypotheses and Lexical Entailment

This paper suggests refinements for the Distributional Similarity Hypothesis. Our proposed hypotheses relate the distributional behavior of pairs of words to lexical entailment – a tighter notion of semantic similarity that is required by many NLP applications. To automatically explore the validity of the defined hypotheses we developed an inclusion testing algorithm for characteristic features...


A Compositional Perspective in Convolution Kernels

Kernel-based learning has been largely adopted in many semantic textual inference tasks. In particular, Tree Kernels (TKs) have been successfully applied in the modeling of syntactic similarity between linguistic instances in Question Answering or Information Extraction tasks. At the same time, lexical semantic information has been studied through the adoption of the so-called Distributional Se...


Topic Models for Meaning Similarity in Context

Recent work on distributional methods for similarity focuses on using the context in which a target word occurs to derive context-sensitive similarity computations. In this paper we present a method for computing similarity which builds vector representations for words in context by modeling senses as latent variables in a large corpus. We apply this to the Lexical Substitution Task and we show...


Using Distributional Similarity to Identify Individual Verb Choice

Human text is characterised by the individual lexical choices of a specific author. Significant variations exist between authors. In contrast, natural language generation systems normally produce uniform texts. In this paper we apply distributional similarity measures to guide verb choice in a natural language generation system that tries to generate text similar to that of an individual author. By using ...


Augmenting Approximate Similarity Searching with Lexical Information

Accurately representing synonymy using distributional similarity requires large volumes of data to reliably represent infrequent words. However, the naïve nearest-neighbour approach to compare context vectors extracted from large corpora scales poorly. The Spatial Approximation Sample Hierarchy (SASH) is a data-structure for performing approximate nearest-neighbour queries, and has been previou...



Publication date: 2012